Designing the Board’s new literature Achievement Test



To construct a nonessay literature test that measures both sharpness 
of observation and depth of understanding, yet is still fair to 
students of varied backgrounds, is less easy than you may have guessed. 



This May, the College Board offered 
for the first time a one-hour,
multiple-choice Achievement Test in English
literature. In effect, the new test is an 
outgrowth of the recommendations of 
the Committee of Review for the Ex- 
aminations in English, appointed by 
Board President Pearson in 1964 to 
review the Board tests in English in 
the light of the findings of the Com- 
mission on English and the testing 
needs for admissions and placement. 
A major recommendation of this
committee was that the Board’s English
examinations should reflect the
tripartite nature (language, literature,
and composition) of most English
curriculums today.

Work on the test began in October 
1965, when Board members authorized
the Board’s Committee on Examinations
to develop an experimental two-hour
test of both English composition and
literature. The Committee reported in
March 1967 that preliminary experiments
had shown the combined two-hour test
to be neither
psychometrically nor administratively 
feasible, but that the objective portions 
of the literature section seemed usable 
as a one-hour test. With the approval 
of the Board’s trustees, the first
administrations of the new test were
scheduled for this May and for twice 
next year. 

In this article, one of the test’s
designers details the “gripping saga” of
how the literature committee of the
Committee on Examinations created 
the new test: what objectives were con- 
sidered, how they were evaluated, what 
kinds of questions were finally selected, 
and why. 

The literature committee was
originally charged by the College Board’s
Committee of Review for the
Examinations in English with creating a
test to meet four objectives: 1) to
measure the breadth of a student’s
reading; 2) to measure his understanding
in some depth of works he has studied
and read; 3) to measure his response to
literature, jointly affective and
evaluative; and 4) to measure his ability
to use whatever critical skill he has on 
texts unfamiliar to him. 

These four objectives include most 
of those listed in statements on second- 
ary school literature teaching. Omitted 
are the attitudinal and the broad, hu- 
manistic objectives which are long- 
term goals rather than immediate 
rewards. Each of the four presents cer- 
tain measurement problems, particu- 
larly within the hour-long format of 
College Board Achievement Tests. My 
hope is that this gripping saga of the 
committee’s struggle with these
problems will be of use to the creator of
classroom tests. 

The first objective, to measure the 
breadth of a student’s reading, seems
easiest, particularly with
multiple-choice questions. What could be
simpler than: Duncan is Macbeth’s (a)
son; (b) brother; (c) king; (d)
murderer; (e) conqueror? Such a
question can separate those who remember
the play from those who do not. It is 
unimpeachable. And when taken by it- 
self, it is unfair. 

Macbeth is the most-taught work in 
American high schools— about 70 per- 
cent of them include it in their curric- 
ulums. But what about the other 30 
percent? When you go beyond Silas 
Marner, Romeo and Juliet, Huckle- 
berry Finn, and similar pedagogical 
bestsellers, you will find that not more 
than about 10 percent of the schools 
assign any single title. Particular short 
stories and lyric poems are used by 
even fewer schools. To give every
student a fair shake, a test of literary
breadth should include at least 100 
questions, and even then the examiner 
gains less an index of achievement 
than of mere acquaintance. 

The second objective, “to measure
understanding in depth” of works
studied by the student, poses a differ- 
ent sort of problem. Such a measure 
requires an essay answer, because a set 
of multiple-choice questions cannot be 
made up for every book a student 
might have read, and a universal series 
of such questions is absurd. Essay 
questions do exist in the Advanced
Placement English Examination, but
even they run into practical snags that 
would take on immense proportions in 
a really large-scale examination. 

The first snag is that of differing in- 
terpretations. If, in all honesty, some 
benighted candidate wrote an interpre- 
tation of Macbeth which made Duncan 
a symbol of France, Macbeth of Ho 
Chi Minh, and Macduff of Henry 
Cabot Lodge, most responsible graders 
would color him dense. But it might be 
the interpretation he was taught. The 
second snag is, once again, that of va- 
riety. Few Advanced Placement stu- 
dents choose Leon Uris as their author 
of literary merit; he isn’t assigned in
Advanced Placement classes. But his 
work is often assigned in other classes, 
and a student cannot be blamed for 
writing about the theme of man and 
society as exemplified, for instance, in 
Exodus. 

The third snag, that of sampling, is 
the biggest one. The Advanced Place- 
ment Examination is three hours long. 
This means the candidate can write at 
least two essays of this sort as well as
answer other questions. An inappro- 
priate choice, a misunderstanding of 
the question, or a dubious interpreta- 
tion in one essay can be atoned for in 
the other. In an hour-long essay test, 
however, the number of targets at 
which to shoot is limited, and the 
candidate cannot demonstrate his abil- 
ity as fully as he might. 



Mastery and response: untestable? 

In addition to these three snags, 
there is a theoretical criticism of the
importance of the goal itself. It main- 
tains that you teach a particular work 
—Macbeth, for instance— less so that 
students may become masters of
Macbeth than that later they may be
equipped to deal with Hamlet. The 
goal of understanding in depth, then, 
is only a way station to a larger goal. 
In any case, the committee decided 
that the circumstances under which 
the new test was to be introduced mili- 
tated against measuring the objective 
of understanding in depth alone.
However, a brief essay on a book of the
candidate’s choice still could be paired 
with objective three or four. 

Objective three, “to measure his
response to literature, jointly affective
and evaluative,” deals with one of the
most important goals of instruction in
literature. If students aren’t moved by
books and have not developed a taste
for them, then English teachers have 
failed. Everything else is merely a 
means to this goal, the achievement of 
which will determine the reading hab- 
its and the very being of the student. 
How then to measure response? Should 
the candidate report on his enjoyment 
of what he is reading? Should his 
taste be measured against some norm? 
Should we take an electroencephalo- 
graph? 

The answer to the question of meas- 
uring affective response is that we 
don’t know yet how to do it well
enough. In part, we do not trust the 
students to be honest; often enough we 
ourselves aren’t honest, particularly 
when we teach something for the
twentieth time (Silas Marner, for
instance) simply because the school board
won’t buy another set of books until 
the old ones wear out. Besides, affective 
responses differ in nature, and no sin- 
gle criterion exists. We want students 
to be caught up in books, but at the 
same time we want them to be detached 
observers and interpreters. The bal- 
ancing point is hard to find. 

How about testing for a measure of 
taste? One might use 20 passages— 10 
judged good, 10 bad, by a panel of 
teachers— and ask students to rank 
them. Experiments with this sort of 
test have shown, though, that more 
often than not the students make poor 
choices. Besides, what teacher would 
be willing to do more than make a 
rough distinction between Shakespeare 
and Neil Simon, between William
Faulkner and Mickey Spillane? The
middle ground on which John O’Hara
and John Updike reside is full of
quicksands. A taste test is intriguing 
dynamite but, unfortunately, inappro- 
priate as a College Board Achievement 
Test. A measure of the premises of 
taste might have a better chance of 
success, but there are still problems. 
Whether a poem is good primarily be- 
cause of its form, its theme, or its 
effect on the reader is moot: what
counts is the persuasiveness of the
argument for using any one of these 
criteria in a particular case. In effect, 
such a test would be more a test of 
composition than of literary skill. 

The last objective, “to measure his
ability to use whatever critical skill he
has in texts unfamiliar to him,” also
presents problems. Critical skill may
be said to consist of the ability to
perceive the parts and the relationship of
parts to each other and to the whole 
of a work; the ability to devise a co- 
herent interpretation of the work; the 
ability to perceive the effect of the 
work on the reader’s response; and the 
ability to make a reasoned judgment 
about the work. The problem in
measuring this goal is the tentativeness you
must assume in asserting anything 
definitive about a person. 

With some degree of assurance, you
can say that a person who can answer 
six subtraction problems using three 
columns can answer most subtraction 
problems. But assurance melts away 
when you discover that a student can 
answer six questions on imagery, 
rhythm, substance, tone, structure, 
and point of view in a lyric by William
Butler Yeats. Can he do the same with
another poem by Yeats, let alone with
one by a poet like Robert Herrick or 
Ben Jonson? We know that students 
who have cut their critical teeth on 
John Donne, Andrew Marvell, and the 
early T. S. Eliot are often incapable of 
talking or writing about Walter Landor, 
William Cowper, or Alfred, Lord
Tennyson. This is not to say that they 
are not good students of literature, 
but that they have become so absorbed 
in the explication of difficult works 
that they have trouble appreciating 
simplicity. Other students may be put 
off by certain words in a poem. After 
all, that is what literature is— a quick- 
silver art that changes with each read- 
ing without losing its weight and mass. 

Anyone measuring this objective, 
therefore, must consider the unique- 
ness of the work and the response to it, 
and the tentativeness of any assertion
about achievement. Yet this objective, 
the developed ability to apprehend a 
new literary work, is the chief one un- 
derlying most literature curriculums— 
be they historical, thematic, or generic. 
Further, the problems of measurement 
are ones that can be borne more easily 
than can the problems involved in each 
of the other objectives. For this rea- 
son, as well as for the reason that a 
measure of critical ability would
influence secondary school curriculums
positively and would not hamstring
them, as might tests of literary ac- 
quaintance or tests on set books, the 
literature committee decided that the 
Achievement Test in literature should 
“seek knowledge about a student’s 
ability to comprehend, analyze, and 
evaluate literary works. It will focus 
on passages in the various genres, 
drawn primarily from English and 
American literature.” 

The eight skills finally chosen 

It was decided that the test questions 
should seek to measure a student’s 
ability: 1) to paraphrase parts of the
work or summarize the whole work;
2) to comprehend the structure of the
text; 3) to comprehend language and
style; 4) to comprehend rhetorical and
literary devices; 5) to comprehend the
ways by which structure and language,
and rhetorical and literary devices
enhance and even create the meaning and
form of a work; 6) to classify a text by
genre, tradition, or period; 7) to
understand allusions to common figures
or symbols in mythology, literature,
and folklore; and 8) to deal with such
general aspects of literary study as
theme, history, and the writer’s art.

The last of these abilities was in- 
cluded so that the committee might 
feel free to add an essay portion to 
the test should it deem such an addi- 
tion worthwhile. For the foreseeable 
future, however, the test will consist 
of multiple-choice questions, all of 
them directed at specific passages— 
prose, poetry, or drama— presumably 
unfamiliar to the candidate. In the 
hour-long test, there will be from 6 to 
12 passages varying in length, genre, 
form, and historical period. This num- 
ber of passages should provide a suffi- 
cient aggregate to enable one to assert 
that a candidate who does well on the 
test can handle not any text that might 
come along, but a goodly number of 
them. Such a candidate certainly 
should be acknowledged as a compe- 
tent reader of literature, and therefore 
that much more eligible for admission 
to a liberal arts college. 

What of the questions, however? Can 
any searching questions be cast into 
the multiple-choice mold? The answer 
depends on what you mean by “search- 
ing.” Certain limitations are imposed 
on the questions. Let me take as an 
example the sample questions on
“Medusa” by Louise Bogan, in the
accompanying box (asterisks indicate
correct answers).

These 10 questions represent the sort 
that will be asked on the new test. The 
first asks the source of an allusion, the 
second the form of the work. Both re- 
quire factual information that the stu- 
dent should be able to bring to the 
poem. (Other appropriate factual 
questions might be about approximate 
date, or in some cases the probable 
author.) Questions 3, 4, and 5 are also 
relatively factual, but deal with in- 
ternal aspects of the poem. Numbers 
3 and 4 ask about the effect of the 
language and ask the student to re- 
ject an implausible connotation or an 
unimportant point about structure. In 
question 5, the student is asked to con- 
trast the details in two parts of the 
poem. The last of these can be verified 
empirically; the first two depend on
the student’s knowledge of semantics 
and of the concept of organicism in a 
work of art (that certain aspects of 
linguistic form are coincidental and 
others are meaningful).

The first four questions, then, are 
relatively straightforward and empiri- 
cally verifiable, given a basic consen- 
sus about the nature of poetic language 
and form. But you may argue that 
these are not searching, and you would 
be right; yet to answer them correctly
is to display a basic knowledge of what 
literary language is all about. 

Three measures of discernment 

Questions 6, 7, and 8 move beyond 
this point. They ask students to make 
summary statements about the
relationships between parts of the poem
(question 6), about the tenor of a 
metaphor (question 7), and about the 
functional effect of stylistic devices 
(question 8). The first of these ques- 
tions calls upon an ability to make 
generalizations about the content, the 
second to comprehend the relationship 
of resonant language to the work as a 
whole, and the third to relate a stylis- 
tic device— in this case aberrant versi- 
fication— to the pace of the poem. This 
last question goes a bit further than 
the others in that it asks students to 
reject various overinterpretations of 
stylistic devices. It is perhaps a
dangerous question, but I think its incorrect
options can all be dismissed as periph- 
eral at best, misguided at worst. 

These three questions, then, go
beyond the factual and ask students to
read discerningly, keeping one eye on
the poem as a whole and the other on
certain of its facets. They do not go
beyond that which is verifiable by
reference to the verbal context. (Other
questions might have been asked: on
the relation between Medusa and a

MEDUSA

I had come to the house, in a cave of trees,
Facing a sheer sky.
Everything moved,—a bell hung ready to strike,
Sun and reflection wheeled by.

When the bare eyes were before me
And the hissing hair,
Held up at a window, seen through a door.
The stiff bald eyes, the serpents on the forehead
Formed in the air.

This is a dead scene forever now.
Nothing will ever stir.
The end will never brighten it more than this,
Nor the rain blur.

The water will always fall, and will not fall,
And the tipped bell make no sound.
The grass will always be growing for hay
Deep on the ground.

And I shall stand here like a shadow
Under the great balanced day,
My eyes on the yellow dust that was lifting in the wind,
And does not drift away.

Reprinted from Collected Poems by Louise Bogan by permission of Farrar,
Straus & Giroux, Inc. Copyright 1954 by Louise Bogan.

1. Medusa is the name of
  (a) a sea nymph
  (b) the muse who was the inspiration of lyric poets
  (c) the girl who led Theseus through the labyrinth
  (d) a woman who turned those who looked at her to stone
  (e) the girl who married Jason after helping him find the Golden Fleece

2. This poem is written in
  (a) free verse
  (b) heroic couplets
  (c) sonnet form
  (d) irregular quatrains
  (e) blank verse

3. In stanza 2, depicting Medusa’s eyes as “bare,” “stiff,” and “bald” does all of the following except
  (a) indicate that the speaker only imagines her effect on him
  (b) prepare for the description of the speaker’s state in stanzas 3-5
  (c) emphasize her inhuman appearance and its effect on the speaker
  (d) create a contrast with “moved” and “wheeled” in stanza 1
  (e) show that her eyes are a prime force behind her spell

4. The contrast in effect between stanza 2 and stanzas 3, 4, and 5 is produced by all of the following devices in stanza 2 except the
  (a) single-syllable rhyme, “air” in line 6 and “hair” in line 9
  (b) lack of a finite verb in the sentence in lines 8-9
  (c) lack of a main clause in the sentence in lines 5-7
  (d) four short phrases in lines 7-8
  (e) word “When” in line 5

5. Which of the following is true of the description in lines 16-17 but not of the description in lines 10-15?
  (a) Lines 16-17 contain references to natural events.
  (b) There is a lack of action in lines 16-17.
  (c) Events described in stanza 1 are repeated in lines 16-17.
* (d) In lines 16-17, there is a possibility that Medusa’s spell is incomplete.
  (e) In lines 16-17, there is a reference to the future like the references in stanza 2.

6. The difference between the scene in stanza 1 and that in stanzas 3, 4, and 5 is primarily the difference between
  (a) joyful events and sad events
  (b) a realistic scene and a scene in a play
  (c) a scene on earth and a scene in hell
  (d) action that did happen and action that will happen
* (e) action and arrested action

7. By comparing himself to a “shadow” (line 18), the speaker indicates that
* (a) in this scene his own reality is dubious
  (b) his consciousness has been destroyed by his experience
  (c) the sky has grown brighter than it was when the poem began
  (d) he thinks he will disappear when the daylight finally ends
  (e) he has died and become a ghost because of Medusa’s spell

8. What is the function of the extra line and more rapid movement in stanza 2?
  (a) To suggest that Medusa was a vigorous and active creature
  (b) To indicate that the speaker moved quickly when he was frightened
* (c) To focus attention on the central experience of the poem
  (d) To show that the central experience lasted longer than others in the poem
  (e) To help the reader to see that the poem is like an old ballad

9. The events in the first stanza are told in the past tense; the events in the third, fourth, and fifth stanzas are told in the present and future tenses. Which of the following is the best inference one can draw from this fact?
  (a) The first scene happened in the speaker’s past; the second scene is an attempt to relate what happened after the speaker left.
  (b) The first scene happened when the speaker was alive; the second scene occurs after death.
  (c) The first scene is perceived by the speaker; the second scene is perceived by Medusa.
  (d) The first scene is real; the second scene is a metaphoric interpretation of it.
* (e) The first scene occurred in the speaker’s life; the second scene occurs in his mind.

10. The poem as a whole deals with
  (a) encountering and killing the mythical Medusa
* (b) experiencing an unnamed shock as powerful as Medusa’s spell
  (c) meeting someone as horrible to look at as Medusa was
  (d) enjoying the idea that the myth of Medusa is not true
  (e) thinking about how Medusa would look if she were real



On a multiple-choice test, interpretive questions
must be limited in their scope and in 
their penetration or risk being untenable. 



house; on the function of “stir,”
“brighten,” and “blur”; on the
summary nature of the word “balanced”
in line 19; or on the mood of the 
speaker in the last two stanzas.) All of 
these call for accurate description of 
the words in the context of the poem. 

How do you test interpretation?

It might be argued that objective ques- 
tions cannot go beyond this point, 
but questions 9 and 10 do, since both 
ask for interpretations of the poem or 
its parts. Question 9 asks for the best 
inference to be drawn from an
observation of the poem’s structure.
(Option B can’t be totally rejected,
perhaps, but E is better, if only because
poetic speakers do not usually speak
from the grave. Options A and C are
clearly unsubstantiated, and option D
violates much of the logic of the
poem.) In a similar way, question 10
asks for the best summary of the poem.
(Clearly option B is most consonant
with the statement and mood of the
poem; only option D has a plausible
ring, but it is conjectural at best and
finally insupportable because there is
really little enjoyment in the poem.)

Of these two questions, 9 is the 
more searching, but it would seem that 
a question like 10 could be made more 
penetrating and still have one clearly 
acceptable option. For instance, sup- 
pose that the word in option D was 
“contemplating” instead of “enjoying” 
the idea that the myth of Medusa is 
not true. It might not be as good an 
answer as B, but the word contemplat- 
ing is ambiguous enough to render 
the option acceptable. The same might 
be said of option A if it were changed
to “encountering and being petrified 
by the mythical Medusa.” This latter 
statement, while not the poem’s sub- 
ject matter, does underlie it. In any 
case, the wrong options for this sort 
of question must be clearly inadequate 
to the poem. 

Thus, questions of interpretation 
must be limited in scope and
penetration. The following two questions
provide a good example of these limits.
Both are good discussion questions but
would be untenable on a multiple- 
choice test. 

The first one asks: Which of the fol- 
lowing does not describe the overall 
progression of the poem? (a) from
ignorance to knowledge; (b) from life
to death; (c) from activity to
paralysis; (d) from time to timelessness;
(e) from happiness to despair? This
is a favorite sort of question in liter- 
ature, asking the student to see move- 
ment in the poem and then to define it 
—here negatively. The first four terms 
serve as partial definitions and, in 
fact, supplement each other. The 
adroit humanist can see that paralysis 
and death have much in common, as
do knowledge and timelessness. These 
four, then, set the limits for the state 
the poem is describing and for the 
speaker who is describing it. The fifth 
option stands out as dealing with a 
different dimension of the poem— that 
of mood— and in that respect is a bad 
question. But you could not make a 
fifth and incorrect option that would 
not either stand out prominently or
retain some defensible points: “action
and inaction,” “past and future,”
“myth and reality,” “chaos and order”
are all defensible; “death and rebirth,”
“poverty and richness,” “sea and 
land,” “stone and wood” are all ridic- 
ulous. In short, once the premise that 
the poem has a certain kind of reso- 
nance (from activity to inactivity and 
from ignorance to knowledge) is 
granted, any number of further reson- 
ances are equally viable. The question, 
then, is a good teaching question, but 
not a good question for objective as- 
sessment. 

A second untenable question asks: 
Which of the following words gives 
the first suggestion that something un- 
usual is going to happen? (a) “had” 
(line 1); (b) “cave” (line 1); (c) 
“sheer” (line 2); (d) “hung” (line
3); (e) “wheeled” (line 4). This
question presents a similar problem, for a
poem is by definition an organic unit, 
and an argument can be made that 
every word has a potency equal to any 
other. “Cave” does stand out as a 
modifier of the image, but “had” by 
virtue of its tense also seems to estab- 
lish an ominous condition. The sensi- 
tive reader might agree that one word 
is more suggestive than another, but 
the difference is simply of degree. 

Both of these questions indicate 
some of the limits involved in multiple- 
choice testing of the understanding 
and appreciation of literature. The 
literature committee and its consult- 
ants are constantly concerned with 
these limits and are always in search 
of ways by which an objective test can 
probe more deeply into the student’s 
response. Other experiments have 
been carried out using questions that 
ask for best and second-best answers, 
that ask a student to judge the validity 
of certain responses, and that try to 
assess his ability to judge the appropri- 
ateness of various suggestions for 
omitted lines from a poem. 

All of these experiments seek to 
make the test of literary acumen one 
which finds out how deeply students 
examine a poem, by following a pat- 
tern of questions similar to those 
asked about “Medusa.” This pattern 
attempts to move from a test of sharp- 
ness of observation to a test of depth 
of understanding, the two qualities 
which seem to mark the mature reader 
of literature. 



